MDEV-39500 STRICT execution mode slave silently overwrites record eve…#5066
MDEV-39500 STRICT execution mode slave silently overwrites record eve…#5066andrelkin wants to merge 1 commit into
Conversation
…n at mismatch The problem at hand is that the default STRICT slave execution mode is not strict enough. In row-based replication, if the slave applier locates a row for an UPDATE using a unique key, it silently applies the changes even if the before-image in the replication event does not match the local non-PK columns. This allows among other things data divergence to go undetected without throwing any errors. This is resolved by introducing a new slave execution mode: STRINGENT. When @@global.slave_exec_mode = STRINGENT is configured, the applier explicitly compares the event's before-image against the local record. If a mismatch is detected, it rejects the update and safely halts replication by throwing a new error, ER_INCONSISTENT_SLAVE_RECORD.
|
|
|
@vuvova, howdy. I am first asking you to agree on the architectural side of the fixes. PS. Naturally this patch relates to another work that you're reviewing.. |
There was a problem hiding this comment.
Code Review
This pull request introduces a new STRINGENT slave execution mode to detect row divergence by comparing the replicated before-image with the local record during updates. The implementation includes a new test suite, error message, and logic within the row-finding process. However, the review identified critical issues: the before-image is not correctly preserved in record[1] before index lookups, which will lead to incorrect comparisons. Furthermore, adding the new mode at the start of the enum is a breaking change that shifts the numeric values of existing modes and alters the default server behavior. These issues need to be addressed by ensuring proper record buffering and appending the new mode to the end of the enumeration.
| bool is_table_scan= false, is_index_scan= false; | ||
| Check_level_instant_set clis(table->in_use, CHECK_FIELD_IGNORE); | ||
|
|
||
| bool do_compare_records= slave_exec_mode == SLAVE_EXEC_MODE_STRINGENT; |
There was a problem hiding this comment.
When STRINGENT mode is enabled, the code performs a record_compare at the end of find_row. However, record_compare expects the before-image to be in table->record[1] and the found row in table->record[0]. In the index lookup path (e.g., PK lookup), table->record[0] is overwritten by the storage engine during the read operation, and the original before-image is lost because it was never copied to table->record[1]. This results in record_compare comparing the found row against uninitialized or stale data in record[1]. You must copy the before-image to record[1] before the index lookup occurs.
bool do_compare_records= slave_exec_mode == SLAVE_EXEC_MODE_STRINGENT;
if (do_compare_records)
memcpy(table->record[1], table->record[0], table->s->reclength);| enum enum_slave_exec_mode { SLAVE_EXEC_MODE_STRINGENT, | ||
| SLAVE_EXEC_MODE_STRICT, | ||
| SLAVE_EXEC_MODE_IDEMPOTENT, | ||
| SLAVE_EXEC_MODE_LAST_BIT }; |
There was a problem hiding this comment.
Adding SLAVE_EXEC_MODE_STRINGENT at the beginning of the enum_slave_exec_mode changes the numeric values of existing modes (STRICT becomes 1 instead of 0, IDEMPOTENT becomes 2 instead of 1). This is a breaking change for any external tools or scripts that rely on the numeric values of @@slave_exec_mode. Furthermore, if the default value is initialized to 0, the default behavior of the server will silently change from STRICT to STRINGENT. It is safer to add new execution modes at the end of the enum, before SLAVE_EXEC_MODE_LAST_BIT.
enum enum_slave_exec_mode { SLAVE_EXEC_MODE_STRICT,
SLAVE_EXEC_MODE_IDEMPOTENT,
SLAVE_EXEC_MODE_STRINGENT,
SLAVE_EXEC_MODE_LAST_BIT };| static const char *slave_exec_mode_names[]= | ||
| {"STRINGENT", "STRICT", "IDEMPOTENT", 0}; |
There was a problem hiding this comment.
The order of names in slave_exec_mode_names must match the order of the constants in enum enum_slave_exec_mode. If you move STRINGENT to the end of the enum as suggested, you must also move it to the end of this array to maintain consistency and avoid breaking the default value mapping.
static const char *slave_exec_mode_names[]=
{"STRICT", "IDEMPOTENT", "STRINGENT", 0};
…n at mismatch
The problem at hand is that the default STRICT slave execution mode is not strict enough.
In row-based replication, if the slave applier locates a row for an UPDATE using a unique key, it silently applies the changes even if the before-image in the replication event does not match the local non-PK columns. This allows among other things data divergence to go undetected without throwing any errors.
This is resolved by introducing a new slave execution mode: STRINGENT. When @@global.slave_exec_mode = STRINGENT is configured, the applier explicitly compares the event's before-image against the local record. If a mismatch is detected, it rejects the update and safely halts replication by throwing a new error, ER_INCONSISTENT_SLAVE_RECORD.